Abstract

We analyze cross-correlations between runs scored over a time interval
in cricket matches of different teams using methods of random matrix
theory (RMT). We obtain an ensemble of cross-correlation matrices C
from runs scored by eight cricket-playing nations for (i) Test cricket
from 1877-2014, (ii) one-day internationals from 1971-2014 and (iii)
seven teams participating in the Indian Premier League T20 format
(2008-2014). We find that a majority of the eigenvalues of C fall
within the bounds of random matrices having joint probability
distribution $P(x_1, \ldots, x_n) = C_{N\beta} \prod_{j<k} w(x_j)\, |x_j - x_k|^{\beta}$, where
$w(x) = x^{N\beta a} \exp(-N\beta b x)$ and $\beta$ is the Dyson parameter. The
corresponding level density gives the Marchenko-Pastur (MP)
distribution, while the fluctuations of every participating team agree
with the universal behavior of the Gaussian Unitary Ensemble (GUE). We
analyze the components of the deviating eigenvalues and find that the
largest eigenvalue corresponds to an influence common to all matches
played during these periods.

PACS numbers: 05.45.Tp, 05.40.-a 

* Corresponding author: nianukalia24@gmail.com 





1 Introduction 


Analyzing correlations among cricket teams of different eras has been a topic
of interest for sports experts and journalists for decades. In this paper we
study such influence (or interaction) by constructing a cross-correlation matrix
C formed from runs scored by teams over different time intervals, formally
called a time series. We consider the time series of batting scores posted per
innings by a team in all official ICC International Test matches played. Then
we construct an ensemble of cross-correlation matrices corresponding to the Test
data for that cricket team. We repeat the process for One Day International
(ODI) and Indian Premier League (IPL) T20 cricket matches. We assume
the correlations to be random and compare the fluctuation properties of
C with those of random matrices. Within the bounds imposed by the RMT
model, fluctuations of C show excellent agreement with the “universal” results
of GUE, while the level density corresponds to the MP distribution.
This implies that interactions in C are random or, in simple words, not
governed by any causality principle. However, outside the bounds, eigenvalues
of C show departures from RMT predictions, implying the influence of external
non-random factors common to all matches played during this period. To
understand this effect, we remove k extreme bands from C and perform the
Kolmogorov-Smirnov (KS) test. We observe a better agreement with RMT
predictions.

We organize the paper as follows. After a brief description of the data
analyzed in sub-section 1.1, we define the cross-correlation matrix in
sub-section 1.2. Section 2 introduces our RMT model along with a brief proof
of the MP distribution. We analyze our results and the corresponding RMT
model in Section 3. This is followed by concluding remarks.


1.1 Data analysed 

We construct three ensembles, corresponding to runs scored in Tests, ODIs
and the Indian Premier League (IPL).

• The ODI ensemble comprises cross-correlation matrices constructed
from runs scored by India, England, Australia, West Indies, South
Africa, New Zealand, Pakistan and Sri Lanka for all official ICC One
Day International matches played between 1971 and 2014. For each
country we have a sequence of runs scored in both home and away
matches. An ensemble of fifty-one 90 x 90 matrices is constructed
from the time series data.

• The Test ensemble comprises cross-correlation matrices constructed
from runs scored by India, England, Australia, West Indies, South
Africa, New Zealand, Pakistan and Sri Lanka. For each country we
have a sequence of runs scored per innings (each match has a maximum
of two innings) in both home and away matches. The Test scores
have been taken for all matches played between England, Australia and
South Africa between 1877 and 1909 and all official ICC Test matches
thereafter, until 2014. An ensemble of seventy 90 x 90 matrices is
constructed from the time series data.

• The IPL ensemble comprises cross-correlation matrices constructed
from runs scored by Chennai Super Kings, Rajasthan Royals, Royal
Challengers Bangalore, Delhi Daredevils, Kings XI Punjab, Kolkata
Knight Riders and Mumbai Indians for all official BCCI IPL T20 matches
played between 2008 and 2014. For each team we have a sequence of
batting scores posted per match. An ensemble of twenty-eight 20 x 20
matrices is constructed from the time series data.


1.2 Cross-correlation matrix 

The cross-correlation matrix C is constructed from a given time series
$X = \{X(1), X(2), \ldots\}$ by defining subsequences $X_i = \{X(i), X(i+1), \ldots, X(N)\}$
and $X_j = \{X(j), X(j+1), \ldots, X(N - \Delta t)\}$, separated by a “lag” $\Delta t = i - j$,
with $j < i$ and $i, j \in \mathbb{N}$. We then normalize the subsequences by defining

$$ Y_i = \frac{X_i - \mu_{X_i}}{\sigma_{X_i}}. \qquad (1) $$

Finally, the cross-correlation matrix C is defined as

$$ C_{i,j} = \langle Y_i Y_j \rangle, \qquad (2) $$

where $\mu_{X_i}$ and $\sigma_{X_i}$ are the sample mean score and standard deviation of the
subsequence $X_i$ respectively, and $\langle \ldots \rangle$ denotes a time average over the period
studied. This is the correlation coefficient between the subsequences $Y_i$ and
$Y_j$, and it helps us understand the correlation between runs scored by a given
team at different time intervals. The matrix elements lie between -1 and 1
and the matrices so constructed are Hermitian.

Now, we construct multiple matrices on a single time series, giving rise
to an ensemble of matrices. Letting $C_1 = C$ (as constructed above), we
construct another matrix $C_2$ by removing the first N elements of the time series
considered and constructing the cross-correlation matrix with the method
described above. We continue this process of construction until the length of
the truncated time series becomes less than N.
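The construction above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used for the analysis: the function names are hypothetical, the scores are synthetic, and the window length `window_len` is kept separate from the matrix size `n` (an assumption made here so that every subsequence has at least two entries and a defined standard deviation).

```python
import random
from statistics import fmean, pstdev


def normalise(seq):
    """Eq. (1): subtract the sample mean and divide by the standard deviation."""
    mu, sigma = fmean(seq), pstdev(seq)
    if sigma == 0:
        raise ValueError("a constant subsequence has no defined correlation")
    return [(x - mu) / sigma for x in seq]


def cross_correlation_matrix(window, n):
    """Eq. (2): n x n matrix of lagged correlations C_ij = <Y_i Y_j>."""
    T = len(window)
    if T - n + 1 < 2:
        raise ValueError("window too short for an n x n matrix")
    C = [[1.0] * n for _ in range(n)]            # zero lag gives C_ii = 1
    for i in range(1, n):
        y_i = normalise(window[i:])              # X(i), ..., X(T)
        for j in range(i):
            lag = i - j
            y_j = normalise(window[j:T - lag])   # X(j), ..., X(T - lag)
            C[i][j] = C[j][i] = fmean([u * v for u, v in zip(y_i, y_j)])
    return C


def build_ensemble(series, n, window_len):
    """One matrix per window; each new window drops the first n scores (assumed step)."""
    starts = range(0, len(series) - window_len + 1, n)
    return [cross_correlation_matrix(series[s:s + window_len], n) for s in starts]


# Synthetic scores for illustration only, not real match data.
rng = random.Random(0)
scores = [rng.randint(50, 400) for _ in range(100)]
ensemble = build_ensemble(scores, n=5, window_len=20)
```

Using the population standard deviation makes each entry an ordinary Pearson coefficient, so the entries stay within [-1, 1] and the matrix is symmetric by construction.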


2 Random Matrix Model 

The Unitary Ensemble of random matrices is invariant under the unitary
transformation $H \to W^{\dagger} H W$, where the ensemble is defined in the space $T_{2G}$ of
Hermitian matrices and W is any unitary matrix. Also, the various linearly
independent elements of H must be statistically independent.

The joint probability distribution function of the eigenvalues $\{x_1, x_2, \ldots, x_N\}$ is
given by

$$ P_{N\beta}(x_1, \ldots, x_N) = C_{N\beta} \prod_{j=1}^{N} x_j^{N\beta a} \exp\left[-N\beta b\, x_j\right] \prod_{j<k} |x_j - x_k|^{\beta}, \qquad (3) $$

where $\beta = 1$, 2 and 4 correspond to the orthogonal (OE), unitary (UE) and
symplectic (SE) ensembles respectively and $C_{N\beta}$ is the normalization constant.
We define the n-point correlation function by

$$ R_n^{(\beta)}(x_1, \ldots, x_n) = \frac{N!}{(N-n)!} \int dx_{n+1} \cdots \int dx_N \, P_{N\beta}(x_1, \ldots, x_N). \qquad (4) $$

This gives a hierarchy of equations given by

$$ \beta R_1(x)\, \mathrm{P}\!\int \frac{R_1(y)}{x - y}\, dy + \frac{w'(x)}{w(x)} R_1(x) = 0, \qquad (5) $$

where

$$ w(x) = x^{N\beta a} \exp[-N\beta b x]. \qquad (6) $$

Figure 1. Level density for averaged Test data with k = 5. The solid line refers to the
Marchenko-Pastur result (9) and the dashed line refers to the finite N result, obtained by
the polynomial method described in Section 2. Here, a = 2.75, b = 3.535, X_- = 0.339601
and X_+ = 1.78204. The largest eigenvalue is circled towards the end of the spectrum.


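We solve the integral equation using the resolvent

$$ G(z) = \int \frac{R_1(y)}{z - y}\, dy, \qquad (7) $$

which satisfies

$$ G(x + i0) = \mathrm{P}\!\int \frac{R_1(y)}{x - y}\, dy - i\pi R_1(x). \qquad (8) $$

Multiplying Eq. (5) by $x/(z - x)$ and integrating over x we get, after some
elementary calculation,

$$ \rho(x) = \frac{R_1(x)}{N} = \frac{b}{\pi x} \sqrt{(x - X_-)(X_+ - x)}, \quad X_- < x < X_+, \qquad (9) $$

and $\rho(x) = 0$ otherwise, where

$$ X_{\pm} = \frac{a + 1}{b} \pm \frac{\sqrt{2a + 1}}{b}. \qquad (10) $$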
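As a quick numerical check of Eqs. (9) and (10), the following Python sketch evaluates the spectrum edges and the limiting density for the Test data parameters a = 2.75, b = 3.535 quoted in the caption of Figure 1. The function names and the midpoint integrator are illustrative choices made here, not part of the original analysis.

```python
import math


def mp_edges(a, b):
    """Eq. (10): lower and upper edges X_-, X_+ of the spectrum."""
    centre, half_width = (a + 1) / b, math.sqrt(2 * a + 1) / b
    return centre - half_width, centre + half_width


def mp_density(x, a, b):
    """Eq. (9): limiting level density, zero outside (X_-, X_+)."""
    lo, hi = mp_edges(a, b)
    if not lo < x < hi:
        return 0.0
    return b / (math.pi * x) * math.sqrt((x - lo) * (hi - x))


def integrate(f, lo, hi, steps=20000):
    """Midpoint rule; adequate for the square-root behaviour at the edges."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))


lo, hi = mp_edges(2.75, 3.535)   # Test data parameters, k = 5
total = integrate(lambda x: mp_density(x, 2.75, 3.535), lo, hi)
print(f"X_- = {lo:.6f}, X_+ = {hi:.5f}, integral = {total:.4f}")
```

The edges computed this way should agree with the quoted values of X_- and X_+, and the density should integrate to one, which is a useful sanity check on the prefactor b/(πx).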

For finite N, following the Dyson-Mehta method, we use

$$ \rho(x) = \frac{1}{N} \sum_{j=0}^{N-1} \phi_j^2(x), \qquad \phi_j(x) = \sqrt{w(x)}\, P_j(x), \qquad (11) $$

where $P_j(x)$ are orthonormal polynomials which satisfy

$$ \int P_j(x) P_k(x) w(x)\, dx = \delta_{j,k}, \qquad j, k \in \mathbb{N}. \qquad (12) $$

To understand the correlations in the system, we first need to unfold the
eigenvalues to eliminate the global effect of the level density on the
fluctuations. The eigenvalue sequence for each country is unfolded
independently. The corresponding unfolded eigenvalues $y_k$ are given by [11]

$$ y_k = \int_{X_-}^{x_k} \rho(x)\, dx, \qquad (13) $$

and the mean spacing of the unfolded eigenvalues $y_k$ is 1. We perform
unfolding using both (i) the theoretical level density (9) and (ii) numerical
integration of the data, obtaining the best fit over the integrated density.
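A minimal sketch of the theoretical unfolding of Eq. (13) is given below, assuming the Marchenko-Pastur density of Eq. (9). The helper names are hypothetical, the integral is evaluated with a simple midpoint rule, and the result is multiplied by the number of levels M so that the unfolded levels have mean spacing close to one; that scaling is an assumption made here for the density normalized to unity.

```python
import math


def mp_density(x, a, b):
    """Eq. (9): limiting level density, zero outside the spectrum edges."""
    lo = ((a + 1) - math.sqrt(2 * a + 1)) / b
    hi = ((a + 1) + math.sqrt(2 * a + 1)) / b
    if not lo < x < hi:
        return 0.0
    return b / (math.pi * x) * math.sqrt((x - lo) * (hi - x))


def mp_cdf(x, a, b, steps=4000):
    """Integrated density from X_- up to x (midpoint rule), as in Eq. (13)."""
    lo = ((a + 1) - math.sqrt(2 * a + 1)) / b
    hi = ((a + 1) + math.sqrt(2 * a + 1)) / b
    upper = min(max(x, lo), hi)        # the density vanishes outside the edges
    h = (upper - lo) / steps
    return h * sum(mp_density(lo + (k + 0.5) * h, a, b) for k in range(steps))


def unfold(eigenvalues, a, b):
    """Map sorted eigenvalues x_k to y_k = M * cdf(x_k), M = number of levels."""
    m = len(eigenvalues)
    return [m * mp_cdf(x, a, b) for x in sorted(eigenvalues)]


# Illustrative eigenvalues only, not taken from the cricket data.
unfolded = unfold([1.5, 0.5, 1.0], a=2.75, b=3.535)
```

Eigenvalues lying above X_+ all map to the same unfolded value in this sketch, which is one reason to treat the deviating large eigenvalues separately.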




(a) Theoretical unfolding (b) Numerical unfolding

Figure 2. Nearest neighbour spacing distribution for mixed and averaged Test data
obtained via numerical and theoretical unfolding (using the Marchenko-Pastur result (9) with
a = 2.75, b = 3.535, X_- = 0.339601 and X_+ = 1.78204). The solid line refers to the spacing
distribution of the experimental data with k = 5, the dotted line refers to the GUE result and the
dashed line refers to the Poisson case.


For spacings $\{S_i \mid S_i = y_{i+1} - y_i\}$ and $s_i = S_i / D$, where $y_i$ denote successive
unfolded levels and D is the average spacing, the level spacing distribution $p(s)\,ds$ is
defined as the probability of finding an $s_i$ between s and s + ds. For no
correlations between the levels, we have the Poisson distribution

$$ p(s) = \exp[-s], \qquad (14) $$


while for GUE, we get Wigner's surmise

$$ p(s) = \frac{32 s^2}{\pi^2} \exp\left[-\frac{4 s^2}{\pi}\right]. \qquad (15) $$
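The two reference distributions of Eqs. (14) and (15), together with the normalized spacings they are compared against, can be sketched as follows. The function names and the histogram binning are illustrative assumptions, not the settings used for Figure 2.

```python
import math


def poisson_spacing(s):
    """Eq. (14): spacing distribution for uncorrelated levels."""
    return math.exp(-s)


def gue_wigner_surmise(s):
    """Eq. (15): Wigner's surmise for the GUE."""
    return 32 * s ** 2 / math.pi ** 2 * math.exp(-4 * s ** 2 / math.pi)


def normalised_spacings(unfolded):
    """s_i = S_i / D with S_i = y_{i+1} - y_i and D the average spacing."""
    levels = sorted(unfolded)
    gaps = [hi - lo for lo, hi in zip(levels, levels[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return [g / mean_gap for g in gaps]


def spacing_histogram(spacings, bin_width=0.25, s_max=4.0):
    """Density-normalised histogram of spacings on [0, s_max)."""
    n_bins = int(round(s_max / bin_width))
    counts = [0] * n_bins
    for s in spacings:
        k = int(s / bin_width)
        if k < n_bins:
            counts[k] += 1
    return [c / (len(spacings) * bin_width) for c in counts]
```

Both reference curves have unit area and unit mean spacing, so a histogram of normalized spacings can be overlaid on them directly without further rescaling.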



We consider 8 sequences of eigenvalues for the Test data obtained by ensemble
averaging over each country. We unfold these sequences individually and
average over the 8 sequences of spacings. The result shows remarkable
agreement with GUE predictions (Fig. 2). Upon mixing the eigenvalues of the
Test data we observe a Poisson distribution (Fig. 2).



Figure 3. Number variance for the averaged and mixed Test data obtained via numerical
unfolding over the spectra. The solid line refers to the GUE result (18) and the dashed line
refers to the Poisson case. The figure plots three cases: (i) averaged Test data with k = 5
extreme diagonals removed, (ii) mixed Test data with k = 5 extreme diagonals removed
and (iii) mixed Test data for the entire spectrum when no diagonals are removed from the
matrices.


Another statistic considered is the linear statistic or the number variance.
For $n_k$ unfolded levels in consecutive sequences of length n, we define the
moments [11]

$$ M_p(n) = \frac{1}{N} \sum_{k=1}^{N} n_k^p, \qquad (16) $$

where N is the number of sequences considered, each of length n, covering
the entire spectrum. Then the number variance $\Sigma^2(n)$ is given by

$$ \Sigma^2(n) = M_2(n) - n^2. \qquad (17) $$

For GUE, the number variance is given by

$$ \Sigma^2(n) = \frac{1}{\pi^2} \left( \ln(2\pi n) + \gamma + 1 \right), \qquad (18) $$

where $\gamma$ is the well-known Euler constant. The number variance is known to be
very sensitive for larger values of n on account of spectral rigidity. Fig. 3
shows a very good agreement of the experimental number variance of
the Test data with the GUE result for the cases when k = 0 and k = 5
extreme diagonals are removed from both ends of the matrices involved in the
calculation.
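Eqs. (16) to (18) translate into a short Python sketch. It assumes the levels have already been unfolded to unit mean spacing, counts levels in consecutive non-overlapping windows of length n starting from the lowest level, and discards the incomplete final window; these conventions and the function names are assumptions made for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def number_variance(unfolded, n):
    """Eqs. (16)-(17): Sigma^2(n) = M_2(n) - n^2 from consecutive windows."""
    levels = sorted(unfolded)
    origin = levels[0]
    n_windows = int((levels[-1] - origin) / n)
    if n_windows < 1:
        raise ValueError("spectrum shorter than one window")
    counts = [0] * n_windows
    for y in levels:
        k = int((y - origin) / n)
        if k < n_windows:              # drop the incomplete last window
            counts[k] += 1
    m2 = sum(c * c for c in counts) / n_windows   # second moment M_2(n)
    return m2 - n * n


def gue_number_variance(n):
    """Eq. (18): GUE prediction for the number variance."""
    return (math.log(2 * math.pi * n) + EULER_GAMMA + 1) / math.pi ** 2


# A perfectly rigid "picket fence" spectrum has no fluctuations at all.
picket_fence = [0.5 + m for m in range(100)]
rigid = number_variance(picket_fence, 5)
```

For uncorrelated (Poisson) levels the same estimator grows linearly with n, whereas the GUE prediction grows only logarithmically, which is what makes the statistic a sensitive probe of long-range correlations.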



Figure 4. The Dyson-Mehta least-squares statistic for the averaged and mixed Test data
with k = 5 extreme diagonals removed from both ends of the matrices involved in the
calculation, obtained via numerical unfolding of the spectrum. The solid line refers to the GUE
result (20) and the dashed line refers to the result for the Poisson case (21).


The other statistic considered is the Dyson-Mehta least-squares statistic,
or the spectral rigidity statistic, which measures the long-range correlations
and irregularity in the level series of the system by calculating the
least-squares deviation of the unfolding function from a straight line y = aE + b
over different ranges L. The statistic $\Delta(L)$ for $L = L_2 - L_1$ is given by the
integral

$$ \Delta(L) = \frac{1}{L} \int_{L_1}^{L_2} \left[ N(E) - aE - b \right]^2 dE, \qquad (19) $$

where N(E) is the unfolding function. The mean value of the statistic for
the GUE case is given by

$$ \langle \Delta \rangle = \frac{1}{2\pi^2} \left( \ln(2\pi L) + \gamma - 5/4 \right). \qquad (20) $$

For the Poisson case, the least-squares statistic is given by

$$ \langle \Delta \rangle = \frac{L}{15}. \qquad (21) $$


3 Analysis 

The problems that one encounters in the analysis of such data are:

1. The finite length of the time series available introduces measurement noise.

2. A longer time series introduces more contributions from non-random
events, which affect the “universality” result but provide
information about the correlations among different time series.

We study the RMT model defined by Eq. (3). We obtain the MP distribution
(9) for the level density as N → ∞. We observe that the level density of
eigenvalues of C in the bulk shows remarkable agreement with the MP
distribution for all Test, ODI and IPL data. However, some large eigenvalues
exist outside the bounds [X_-, X_+]. To ensure that these eigenvalues are
not due to a finite N effect, we obtain the level density for finite N. For this,
we develop the corresponding orthonormal polynomials using the Gram-Schmidt
method and, using Eq. (11) for N = 10, obtain the level density and compare
it with the ensembles of cricketing data (Fig. 1). We observe that the large
eigenvalues still remain outside the bounds.
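The finite N density of Eq. (11) can be sketched numerically as follows. Note the substitution: instead of a Gram-Schmidt procedure, this sketch uses the fact that, for the weight of Eq. (6) on the assumed domain x > 0, the orthonormal polynomials are rescaled associated Laguerre polynomials, which can be generated by their three-term recurrence. The function name, the choice β = 2 and the domain are assumptions made for illustration.

```python
import math


def finite_n_density(x, n_levels, a, b, beta=2):
    """Eq. (11) for w(x) = x^alpha exp(-c x), alpha = N*beta*a, c = N*beta*b."""
    if x <= 0:
        return 0.0
    alpha = n_levels * beta * a        # exponent of x in the weight
    c = n_levels * beta * b            # decay rate of the weight
    t = c * x
    # log of c^(alpha+1) x^alpha exp(-c x), kept in log space to avoid overflow
    log_w = math.log(c) + alpha * math.log(t) - t
    l_prev, l_curr = 0.0, 1.0          # L_{-1} (unused) and L_0
    total = 0.0
    for j in range(n_levels):
        # squared norm of L_j^(alpha) is Gamma(j + alpha + 1) / j!
        log_norm = log_w + math.lgamma(j + 1) - math.lgamma(j + alpha + 1)
        total += math.exp(log_norm) * l_curr ** 2   # phi_j(x)^2
        l_next = ((2 * j + 1 + alpha - t) * l_curr - (j + alpha) * l_prev) / (j + 1)
        l_prev, l_curr = l_curr, l_next
    return total / n_levels


# Normalisation check for N = 10 with the Test data parameters.
h = 0.001
area = h * sum(finite_n_density((k + 0.5) * h, 10, 2.75, 3.535) for k in range(4000))
```

Since each φ_j² integrates to one, the density should integrate to one for any N, which gives a simple check that the recurrence and the normalization constants are consistent.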

The next question is whether these large eigenvalues are non-random, in which case
our RMT model will not only show disagreement with the level density but
also “spoil” the RMT predictions. To verify this, we perform the RMT analysis
over the entire spectrum and compare its results with those of the truncated sparse
matrix, which removes the large eigenvalues. The KS test shows that our level
density and spacing distribution analysis is considerably hampered by the
presence of these large eigenvalues, thereby confirming the existence of
non-random long-range correlations.

To track the level of non-randomness, we remove k (k ≪ N) extreme
bands out of the 2N - 1 bands of the N x N matrices C and perform the KS test.
We perform numerical unfolding over the eigenvalues, where the integrated
density of states is fitted with a polynomial. For ODI, where N = 90, we
obtain a p-value of 0.640311 for the full spectrum and a p-value of 0.9025
for the spectrum of the matrix with k = 15. For the Test data (again N = 90),
we obtain a p-value of 0.49 for unfolding the full spectrum and a p-value of
0.855394 when unfolding the spectrum of the matrix with k = 5. Thus, by
creating a sparse matrix, which removes the large eigenvalues, our results
converge to RMT predictions by ~ 30%. This indicates the existence of
non-randomness in the system introduced by the elements $C_{ij}$ with $|i - j| \sim N$.
We observe that as we increase the value of k, the largest eigenvalue in the
spectrum gradually reduces and converges towards the bound imposed by
the RMT model, as shown in Fig. 5. We then perform theoretical unfolding on the
new data and observe similar agreement on the KS test.
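The band removal and the KS comparison can be sketched as below. Several points are assumptions made here for illustration: "removing k extreme bands" is read as zeroing the k outermost diagonals on each side of the matrix, the reference distribution is taken to be the integrated GUE surmise of Eq. (15), and the p-value uses the standard large-sample Kolmogorov series rather than whatever routine produced the quoted values. All function names are hypothetical.

```python
import math


def remove_extreme_bands(C, k):
    """Zero the k outermost diagonals on each side of a square matrix."""
    n = len(C)
    return [[0.0 if abs(i - j) >= n - k else C[i][j] for j in range(n)]
            for i in range(n)]


def ks_statistic(sample, cdf):
    """One-sample KS distance D between the empirical CDF and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n)
               for i, x in enumerate(xs))


def ks_p_value(d, n, terms=100):
    """Large-sample Kolmogorov approximation to the p-value."""
    lam = math.sqrt(n) * d
    if lam < 0.2:                      # the series is ~1 here and converges slowly
        return 1.0
    q = 2 * sum((-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, q))


def gue_spacing_cdf(s):
    """Integral of the GUE Wigner surmise, Eq. (15), from 0 to s."""
    return (math.erf(2 * s / math.sqrt(math.pi))
            - 4 * s / math.pi * math.exp(-4 * s * s / math.pi))
```

A typical use would compute normalized spacings from the spectrum of `remove_extreme_bands(C, k)` for increasing k and track how `ks_p_value(ks_statistic(spacings, gue_spacing_cdf), len(spacings))` changes.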

For the number variance calculation, we first unfold the spectrum and
calculate the number variance both within the bounds and over the entire spectrum.
The former gives good agreement with GUE while the latter, as expected,
shows deviation, pointing towards the presence of large eigenvalues which are
due to correlation coefficients between runs scored over a long time gap.



Figure 5. Largest eigenvalue in the averaged spectrum vs. k for the Test, ODI and IPL 
data 


Finally, theoretical unfolding is performed over the spectra using Eqs. (13)
and (9). The MP distribution parameters for the Test data (k = 5) are given
in Fig. 1. For the ODI data (k = 15), we have a = 2.475, b = 3.15,
X_- = 0.328806 and X_+ = 1.87754 as the optimal parameters for Eq. (9).
Lastly, we mix levels obtained from the time series of all teams and observe
a Poisson distribution (Fig. 2).


4 Conclusion 


From the statistical analysis of Test, ODI and IPL data, we conclude that
the eigenvalues of cross-correlation matrices display GUE universality. The
Test and ODI data are the only data sets we found to be large enough to
give results of the nature produced in this paper. Thus, even though the T20
results of the BCCI IPL matches are also considered, the small N effect is
visible in our GUE results.

We observe the Wigner surmise when we study the ensembles of different
countries (in Tests and ODIs) or teams (IPL) separately. However, upon mixing
the data of all countries, we get Poisson statistics, both for spacing and
number variance. Here we may recall that in the study of nuclear data
statistics [12], eigenvalues with the same spin show GOE behaviour but mixed
data give Poisson statistics.


To ensure that the large eigenvalues which lie outside the bounds are not
due to the size of the matrices, we obtain the level density using the
polynomial method for finite N. We observe that the large eigenvalues still
lie well outside the bounds. Also, when numerically unfolding over the whole
spectrum (and not only under the MP bound), we observe that the number variance
shows departure from GUE. However, by removing the long-range interaction
terms from C, we observe a better agreement with RMT predictions, for
level density as well as spacing distribution and number variance.

We believe that eigenvalues close to the upper bound still maintain
randomness and that any deviation is due to temporal effects, for example scores
being affected by a sudden burst of performance of an individual player
over a tournament or bilateral series. However, the larger eigenvalues are
probably caused by more stable, non-random influences, such as the effect on
cricketing performance of the advent of new technology. This
needs a thorough investigation, and we wish to come back to it in a later
publication.


Acknowledgement 

We acknowledge ESPN Cricinfo for providing us with the cricket data.





References 


[1] Vasiliki Plerou, Parameswaran Gopikrishnan, Bernd Rosenow, Luis
A. Nunes Amaral, Thomas Guhr, and H. Eugene Stanley. Random
matrix approach to cross correlations in financial data. Phys. Rev. E,
65:066126, Jun 2002.

[2] Tayeb Jamali, Hamed Saberi, and G.R. Jafari. Fractional Gaussian noise:
a random-matrix-theory inspired perspective. 2013.

[3] Akihiko Utsugi, Kazusumi Ino, and Masaki Oshikawa. Random matrix
theory analysis of cross correlations in financial markets. Phys. Rev. E,
70:026110, Aug 2004.

[4] Laurent Laloux, Pierre Cizeau, Jean-Philippe Bouchaud, and Marc
Potters. Random matrix theory and financial correlations. Int. J. Theor.
Appl. Finance 3, page 391, 2000.

[5] C. Biely and S. Thurner. Random matrix ensembles of time-lagged
correlation matrices: Derivation of eigenvalue spectra and analysis of
financial time-series. ArXiv Physics e-prints, September 2006.

[6] C. W. J. Beenakker. Random-matrix theory of quantum transport. Rev.
Mod. Phys., 69:731-808, Jul 1997.

[7] M.L. Mehta. Random Matrices. Pure and Applied Mathematics. Elsevier
Science, 2004.

[8] Saugata Ghosh. Long-range interactions in the quantum many-body
problem in one dimension: Ground state. Phys. Rev. E, 69:036118, Mar
2004.

[9] Saugata Ghosh. Skew-orthogonal polynomials and random matrix
theory. CRM Monograph Series. AMS, 2009.

[10] V A Marčenko and L A Pastur. Distribution of eigenvalues for some
sets of random matrices. Mathematics of the USSR-Sbornik, 1(4):457,
1967.

[11] Jac Verbaarschot. Topics in random matrix theory. url:
http://tonic.physics.sunysb.edu/~verbaarschot/lecture/.

[12] R. U. Haq, A. Pandey, and O. Bohigas. Fluctuation properties of
nuclear energy levels: Do theory and experiment agree? Phys. Rev. Lett.,
48:1086-1089, Apr 1982.





